Time to go for a walk mate!




This article analysis pedestrian walking trend throughout the last 9 years -2013/2021-, using open data published by the City of Sydney Council.

1. Libraries and database

1.1 Loading libraries

For this analysis, we’ll be using the following libraries:

library(ggplot2)
library(knitr)
library(leaflet)
library(sf)
library(tidyverse)
library(lubridate) 
library(plotly)
library(gganimate) #GIF maker

#For any missing library, remember to <install.packages('PACKAGE-NAME')>

options(scipen=999) #turn off scientific notation

1.2 Loading data

Data available at:

df <- read.csv('data/Walking_count_surveys.csv', stringsAsFactors = TRUE, encoding = "UTF-8")

sites <- st_read('data/Walking_count_sites.shp') %>% 
  rename(SiteID=Site_ID) %>% 
  st_difference()
## Reading layer `Walking_count_sites' from data source 
##   `D:\Escritorio\Proyectos\Sydney_Walking-trend\data\Walking_count_sites.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 120 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 151.1779 ymin: -33.90846 xmax: 151.2323 ymax: -33.8544
## Geodetic CRS:  WGS 84

2. Brief insight

2.1 Pedestrian counting

Let’s have a look at the survey dataset…

head(df)
##   X.U.FEFF.OBJECTID SiteID    Location                              Description
## 1              6760      2 Botany Road Between Bourke Street and Hansard Street
## 2              6761      2 Botany Road Between Bourke Street and Hansard Street
## 3              6762      2 Botany Road Between Bourke Street and Hansard Street
## 4              6763      2 Botany Road Between Bourke Street and Hansard Street
## 5              6764      2 Botany Road Between Bourke Street and Hansard Street
## 6              6765      2 Botany Road Between Bourke Street and Hansard Street
##    Period Year  Month TotalCount               DateFilter Time_0600 Time_0700
## 1 weekday 2013 Spring       3504 2013 - Weekday - October       186       216
## 2 weekend 2013 Spring        834 2013 - Weekend - October        24        78
## 3 weekday 2014 Autumn       3300     2014 - Weekday - May       168       300
## 4 weekday 2014 Spring       2844 2014 - Weekday - October        78       204
## 5 weekend 2014 Autumn       1134     2014 - Weekend - May        54        78
## 6 weekend 2014 Spring        780 2014 - Weekend - October        24        30
##   Time_0800 Time_0900 Time_1000 Time_1100 Time_1200 Time_1300 Time_1400
## 1       582       258       126        42        30        36        78
## 2        72        36        84        66        42        54        30
## 3       504       336       132        42        36        48       150
## 4       204       114        90        84       300        66       126
## 5       108        72        78        72        66       132       114
## 6        78        72        48        72        66        84        12
##   Time_1500 Time_1600 Time_1700 Time_1800 Time_1900 Time_2000 Time_2100
## 1        96       258       462       372       294       234       174
## 2        24        54        90        60        18        42        24
## 3       156       306       522       198       138       132        84
## 4       264       282       540       162       162        54        54
## 5       126        78        24        48        30        12        12
## 6        66        42        42        18        54        36         0
##   Time_2200 Time_2300
## 1        36        24
## 2        30         6
## 3        24        24
## 4        42        18
## 5        18        12
## 6        12        24

Brief insight:

Each site contains the following information:

-   ID & location details

-   Period (weekeday / weekend)

-   Season of the year (fall / sping)

-   Total pedestrian counted per day

-   Total pedestrian counted per hour (6 am to 11 pm)
paste('The dataset contains information of ', length(unique(df$SiteID)), ' sites and ', length(unique(df$Location)), ' locations (clustered sites)')
## [1] "The dataset contains information of  122  sites and  62  locations (clustered sites)"

2.2 Sites

Now, a quick inspection of the geographical data we’ll be we working with:

kable(head(sites))
OBJECTID SiteID Location SiteDescri geometry
1 2 Botany Road Between Bourke Street and Hansard Street POINT (151.2029 -33.90732)
2 3 King Street Between Whitehorse Street and Newman Street POINT (151.1779 -33.89934)
3 4 William Street Between Crown Street and Palmer Street POINT (151.2167 -33.87438)
4 5 City Road Between Carillon Avenue and Forbes Street POINT (151.188 -33.8917)
5 6 Broadway Between Buckland Street and Abercrombie Street POINT (151.1977 -33.88434)
6 7 Railway Square Between Quay Street and Harris Street POINT (151.2031 -33.8837)

It contains information of:

-   ID & location details

-   Geometry
sites <- sites %>%
  select(SiteID, Location) %>% #IDs for joing data
  mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2])

leaflet(sites) %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB) %>% 
  addCircles(lng = ~long, 
             lat = ~lat, 
             weight = 4,
             popup = ~paste('<strong>',"Location:",'</strong>', Location))

2.2 Joining data by ID

Many Locations are clustered under the same name. We’ll create a new tag to identify each one of them separately, using the siteID information

sites <- sites %>% 
    mutate(Location_site=paste(SiteID,'-',Location))

df <- df %>% 
  select(-Location) %>% #repeated in both sets of data
  left_join(sites, by='SiteID')

3. Active mobility trend - time analysis

3.1 Periods and days

Spring or Autumn? Weekdays or weekends?
df <-   dplyr::filter(df, !Year==2022) #Spring's not included yet!

days <- df %>% 
  group_by(Period, Month) %>% 
  summarise(Amount=n(),
            Avg=round(mean(TotalCount),2))

plot_ly(data = days,
        x = ~Month,
        y = ~Avg,
        color = ~Period,
        type = "bar") %>% 
   layout(title = 'Pedestrian behaviour - Periods and days', xaxis = list(title = 'Period'), 
         yaxis = list(title = 'Average amount of pedestrians'))

In average, there are more pedestrians counted during Autumn rather than Spring.

Weekdays are busier, specially when it comes to Autumn.

There’s a higher decline in the amount of pedestrians walking on weekends during Autumn, representing a drop of 22% of the corresponding amount for the same season during weekdays.

During Spring people tend to walk more on weekends compared to Autumn. For this case, the reduction represents 14% of what is counted on weekdays.

Seasons and locations
weekdays <- df %>% 
  dplyr::filter(Period=='weekday') #leaving the weekends behind 

weekdays_2 <- weekdays %>% 
  group_by(Location_site, Month) %>% 
  summarise(Amount=sum(TotalCount)) %>% 
  mutate(Perc=Amount/sum(weekdays$Amount)*100) %>% 
  left_join(sites, by='Location_site') %>% 
  arrange()

plot_ly(data=weekdays_2,
        x = ~Location_site,
        y = ~Amount,
        color = ~Month,
        type = "bar") %>% 
   layout(title = 'Pedestrian behaviour - Periods and days', xaxis = list(categoryorder = "total ascending"), 
         yaxis = list(title = 'Average amount of pedestrians')) 
spring_over_autumn <- weekdays_2 %>% 
  pivot_wider(names_from = Month, values_from = Amount) %>% 
  mutate(DIF_spring=Spring/Autumn*100,
         TIME=case_when(
           DIF_spring > 100 ~ "Spring",
           DIF_spring < 100 ~ "Autumn")) %>% 
  select(Location_site, DIF_spring, TIME)

qpal <- colorFactor(palette = c("#8da0cb", "#66c2a5"),
                               domain = c("Spring", "Autumn"))

sites %>% 
  left_join(spring_over_autumn, by='Location_site') %>% 
  
  leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB) %>% 
  addCircles(lng = ~long, 
             lat = ~lat, 
             radius = ~sqrt(DIF_spring)*5,
             color = ~qpal(TIME),
             popup = ~paste('<strong>',"Location:",'</strong>', Location_site, '<br/>',
                            '<strong>',"Most frequent season:",'</strong>', TIME)) %>% 
    addLegend("bottomright", pal = qpal, values = ~TIME,
    title = "Busiest season",
    opacity = 1)

George Street is by far the most frequent location, followed by Park Street, Martin Place, Oxford Street and King Street.

weekdays %>% group_by(Location) %>% summarise(Amount=sum(TotalCount)) %>%
  arrange(-Amount)
## # A tibble: 62 x 2
##    Location           Amount
##    <chr>               <int>
##  1 George Street    10189412
##  2 Park Street       1655114
##  3 Martin Place      1270494
##  4 Oxford Street      977448
##  5 King Street        949544
##  6 Druitt Street      800502
##  7 Pitt Street Mall   774870
##  8 Railway Square     736068
##  9 Elizabeth Street   695310
## 10 Broadway           605376
## # ... with 52 more rows

3.2 Trend over the years

weekdays_year_LOC <- weekdays %>% 
  group_by(Location, Year) %>% 
  summarise(Amount=sum(TotalCount)) %>% 
  left_join(sites, by='Location') %>% 
  drop_na()
ggplot(weekdays_year_LOC) + 
  geom_point(aes(x=Location, y=Amount, size=Amount), color='#301934', fill='#301934', alpha = 0.6, show.legend = FALSE) +
  labs(x='Location',
       y='Amount of pendestrians counted', 
       title='Pedenstrian behaviour throughout the years', 
       subtitle = '{closest_state}',
       caption = 'Source: https://data.cityofsydney.nsw.gov.au/')+
  theme_minimal()+
  scale_y_log10()+ #log scale
  theme(axis.text.x = element_text(angle=90, hjust=1, size=6))+
  transition_states(states = Year)+
  shadow_wake(wake_length = 0.1, alpha=.05, colour = 'grey')

Many site are grouped within 1 single location. Time to fix the aggregation issue, grouping by sites instead of locations…

Which locations increased its amount of pedestrians the most?

weekdays_year_LOCID <- weekdays %>% 
  group_by(Location_site, Year) %>% 
  summarise(Amount=sum(TotalCount)) %>% 
  left_join(sites, by='Location_site')
ggplot(weekdays_year_LOCID) + 
  geom_col(aes(x=Amount, y=Location_site), color='#301934', fill='#301934', alpha = 0.6, show.legend = FALSE) +
  labs(x='Amount of pendestrians counted',
       y='Site (disaggregated location)', 
       title='Pedenstrian behaviour throughout the years', 
       subtitle = '{closest_state}',
       caption = 'Source: https://data.cityofsydney.nsw.gov.au/')+
  theme_minimal()+
  theme(axis.text.y = element_text(angle=0, size=5))+
  transition_states(Year, transition_length = 2, state_length = 1)+
  ease_aes('sine-in-out')

As some years contain missing information (NA values), we’ll get rid of the empty instances.

weekdays_var <- weekdays_year_LOCID %>%   
  pivot_wider(names_from = Year, values_from = Amount) %>% 
  drop_na() %>% #dropping missing years
  rowwise() %>% 
  mutate(Average = sum(c_across("2013":"2021")/8),
        STD=sd(c_across("2013":"2021")),
        CV=STD/Average, 
        Per1=sum(c_across("2013":"2016")/4),
        Per2=sum(c_across("2017":"2021")/4),
        TOT_V=Per2/Per1*100,
        VAR=case_when(
          TOT_V > 100 ~ "Increased recently",
          TOT_V < 100 ~ "Decreased recently"))
plot_ly(data=weekdays_var,
        x = ~Location_site,
        y = ~CV,
        color = ~VAR,
        type = "bar") %>% 
   layout(title = 'Pedestrian behaviour - variation over time (coeficient of variation)', xaxis = list(categoryorder = "total ascending"), 
         yaxis = list(title = 'Coefficient of variation')) 
qpal <- colorFactor(palette = c("#8da0cb", "#66c2a5"),
                               domain = c("Increased recently", "Decreased recently"))

weekdays_var %>% 
  select(Location_site, CV, TOT_V, VAR) %>% 
  left_join(sites, by='Location_site') %>% 

leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB) %>% 
  addCircles(lng = ~long, 
             lat = ~lat, 
             radius = ~sqrt(CV) *150,
             color = ~qpal(VAR),
             popup = ~paste('<strong>',"Location:",'</strong>', Location_site, '<br/>',
                            '<strong>',"Coefient of variation:",'</strong>', round(CV,2), '<br/>',
                            '<strong>',"Recent effect:",'</strong>', round(TOT_V,2), '%'))%>% 
    addLegend("bottomright", pal = qpal, values = ~VAR,
    title = "Recent effect",
    opacity = 1)

Many locations in the CBD show a decline tendency over the last few years. This could possibly be a consequence of the pandemics.

Cleveland street shows important changes in pedestrian trend, with a recent drop in its amount.

King street, between Enmore Rd. and Sydney Park shows an increase in rates, same as Oxford Street and Moore Park Road.



Thanks for reading!